Reinterpretation of anthocyanins biosynthesis in developing black rice seeds through gene expression analysis

How the quantity of anthocyanins accumulated in rice seeds relates to the expression levels of the transcription factors and structural genes of the anthocyanin biosynthetic pathway remains unclear. We therefore investigated the relationship between accumulated anthocyanin content and the expression levels of genes related to the biosynthesis of anthocyanins in rice seeds. Liquid chromatography/mass spectrometry-mass spectrometry analysis of cyanidin 3-glucoside (C3G) in rice seeds showed no accumulation of C3G in white and red rice cultivars and differential accumulation of C3G among black rice cultivars. RNA-seq analysis of seeds of white, red, and black rice cultivars at twenty days after heading (DAH) showed that the genes involved in the biosynthesis of anthocyanins were differentially upregulated in developing seeds of black rice. We verified these RNA-seq results by quantitative real-time polymerase chain reaction in developing seeds of white, red, and black rice cultivars at 20 DAH. Among the genes related to the biosynthesis of anthocyanins, the regulators bHLHs, MYBs, and WD40 and the structural genes chalcone synthase (CHS), flavanone 3-hydroxylase (F3H), flavonoid 3´-hydroxylase (F3´H), dihydroflavonol 4-reductase (DFR), and anthocyanidin synthase (ANS) were differentially upregulated in black rice seeds. The correlation analysis revealed that the quantities of C3G biosynthesized in black rice seeds were positively correlated with the expression levels of bHLHs, MYBs, WD40, CHS, F3H, F3´H, DFR, and ANS. In addition, we present bHLH2 (LOC_Os04g47040) and MYBs (LOC_Os01g49160, LOC_Os01g74410, and LOC_Os03g29614) as new putative transcription factor genes for the biosynthesis of anthocyanins in black rice seeds.
We expect this study to improve the molecular-level understanding of the biosynthesis of anthocyanins in black rice seeds.

It was first shown in Zea mays that the biosynthesis of anthocyanins is regulated by two transcription factors, C1 [16,17], a myb gene, and R [18], a basic helix-loop-helix (bHLH) gene [19]. Orthologs of these two genes were also reported in rice [20,21]. The biosynthetic pathway of anthocyanins has been well elucidated in plants, as presented in Fig 1. The first step of the biosynthesis of anthocyanins is the conversion of p-coumaroyl CoA, formed from phenylalanine via stepwise reactions by phenylalanine ammonia-lyase (PAL) [22], cinnamate 4-hydroxylase (C4H) [23,24] Compared to developing seeds of white rice with no anthocyanin accumulation, CHS, F3H, DFR, and ANS were upregulated in developing black rice seeds [44], and Kala4 (LOC_Os04g47059), a bHLH gene orthologous to the maize R gene, was also upregulated in black rice seeds [45]. Among black rice cultivars, there were significant differences in the quantities of the anthocyanins C3G and P3G in mature rice seeds [46]. However, little is known about the relationship between the quantities of anthocyanins biosynthesized and the expression levels of genes in the anthocyanin biosynthetic pathway in developing seeds of black rice. In this study, we investigated how the biosynthesis of anthocyanins is related to the expression of genes involved in the biosynthetic pathway of anthocyanins in developing seeds of black rice.

Growth of rice cultivars used in this study
We transplanted and cultivated three replicates of seedlings of Dongjin (white rice), Geonganghongmi (red rice), Jeokjinju (red rice), Boseokheukchal (black rice), Heukjinju (black rice), Heukjinmi (black rice), and Heukseol (black rice) in the experimental paddy field of the National Institute of Crop Science (NICS), Republic of Korea, in a completely randomized design, following the standard rice cultivation method of the NICS. We sampled developing seeds from one panicle per plant and four plants per replicate at twenty days after heading (DAH), and harvested mature seeds at around sixty DAH.

Total RNA extraction and RNA-seq
We extracted total RNA from frozen, milled samples of developing seeds of Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol at 20 DAH, with three replicates per cultivar, using the RNeasy Plant Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. Of these total RNA samples, we sent one replicate each of Dongjin (white rice), Jeokjinju (red rice), and Heukseol (black rice) to Macrogen, Inc (Seoul, Republic of Korea) for RNA-seq using Illumina paired-end sequencing with a 101-bp read length.
We processed the raw RNA-seq data by the methods described by Lee et al. [48]. Briefly, the raw data were quality trimmed using the Cutadapt software with the parameters -a AGATCGGAAGAGC -A AGATCGGAAGAGC -q 30 -m 20 [49], and the trimmed data were mapped to a reference rice genome, MSU7 (http://rice.uga.edu/), using the HISAT2 software [50] with default parameters (S1 Table). Read counts were calculated with the featureCounts software [51] (S2 Table). We finally obtained normalized read counts by dividing the read counts of all genes by those of the OsUBI1 gene (LOC_Os03g13170) (S2 Table). The raw RNA-seq data are available at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-9993.
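The normalization step described above, in which every gene's read count is divided by the read count of OsUBI1, can be illustrated with a minimal sketch. The function name and the example counts are hypothetical and are not taken from S2 Table; only the reference gene identifier comes from the text.

```python
from typing import Dict

REFERENCE_GENE = "LOC_Os03g13170"  # OsUBI1, the reference gene named in the text


def normalize_by_reference(counts: Dict[str, float],
                           reference: str = REFERENCE_GENE) -> Dict[str, float]:
    """Divide every gene's read count by the read count of the reference gene."""
    ref_count = counts.get(reference, 0)
    if ref_count <= 0:
        raise ValueError(f"reference gene {reference} has no reads")
    return {gene: count / ref_count for gene, count in counts.items()}


# Hypothetical read counts for one sample (illustrative values only).
example_counts = {
    "LOC_Os03g13170": 2000,  # OsUBI1
    "LOC_Os01g27490": 500,   # ANS
    "LOC_Os11g32650": 1000,  # CHS
}
normalized = normalize_by_reference(example_counts)
```

With this scheme the reference gene always has a normalized value of 1, so values from different samples are expressed relative to OsUBI1 in each sample.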

Gene expression analysis by quantitative real time-polymerase chain reaction
As described by Lee et al. [48], we synthesized cDNA using the iScript™ cDNA Synthesis Kit (Bio-Rad, Hercules, CA, USA) from 1 μg of total RNA from each of the three replicate samples of Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol. We carried out quantitative real time-polymerase chain reaction (qRT-PCR) with the cDNA and the primer sets listed in S3 Table on the CFX96™ Real-Time Detection System (Bio-Rad, Hercules, CA, USA) using iQ SYBR Green Supermix (Bio-Rad, Hercules, CA, USA). We used the OsUBI1 gene, LOC_Os03g13170, as a reference gene (S3 Table) [52-54]. We used the Pfaffl method [55] to determine the relative expression of the genes described in S3 Table.
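For readers unfamiliar with the Pfaffl method, the following minimal sketch shows the efficiency-corrected ratio it computes: the target-gene efficiency raised to the Ct difference (calibrator minus sample) of the target, divided by the reference-gene efficiency raised to the Ct difference of the reference. The choice of calibrator sample, the efficiency values, and the Ct values below are hypothetical and are not taken from this study.

```python
def pfaffl_ratio(e_target: float, ct_target_calibrator: float, ct_target_sample: float,
                 e_ref: float, ct_ref_calibrator: float, ct_ref_sample: float) -> float:
    """Efficiency-corrected relative expression (Pfaffl).

    e_target and e_ref are amplification efficiencies expressed as the
    fold increase per cycle (2.0 means perfect doubling).
    """
    delta_ct_target = ct_target_calibrator - ct_target_sample
    delta_ct_ref = ct_ref_calibrator - ct_ref_sample
    return (e_target ** delta_ct_target) / (e_ref ** delta_ct_ref)


# Hypothetical Ct values: the target amplifies three cycles earlier in the
# sample than in the calibrator, while the reference gene (for example
# OsUBI1) is unchanged.
ratio = pfaffl_ratio(e_target=2.0, ct_target_calibrator=28.0, ct_target_sample=25.0,
                     e_ref=2.0, ct_ref_calibrator=20.0, ct_ref_sample=20.0)
```

When both efficiencies equal 2.0, the expression reduces to the familiar 2^(-ΔΔCt) calculation; the Pfaffl form matters when primer efficiencies differ between the target and reference genes.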

Quantification of cyanidin 3-glucoside in seeds of black rice cultivars through LC/MS-MS
The National Institute of Crop Science (NICS) has developed and released fourteen black rice cultivars (S3 Table). Of these, we selected four black rice cultivars, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol, considering the quantities of C3G in hulled seeds and the original parent(s) used in the breeding program for improvement of black rice traits. Interestingly, the genome of Heukjinmi partially retains the genomic content of Hongjinju, a red rice cultivar that is one of the parents in its breeding pedigree (S4 Table, http://www.nics.go.kr/api/breed.do?m=100000128&homepageSeCode=nics). From the database of cultivars developed by the NICS (http://www.nics.go.kr/api/breed.do?m=100000128&homepageSeCode=nics), we also selected one white rice cultivar, Dongjin, and two red rice cultivars, Geonganghongmi and Jeokjinju, as controls for comparing metabolic differences with the black rice cultivars.
We quantified C3G in the hulled seeds of seven rice cultivars, Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol, through liquid chromatography/mass spectrometry-mass spectrometry (LC/MS-MS) (Fig 2). Of the seven rice cultivars used in this study, C3G was detected only in the hulled seeds of the black rice cultivars. Based on the quantities of C3G, the four black rice cultivars were statistically classified into two groups: Boseokheukchal and Heukjinmi in one, and Heukjinju and Heukseol in the other. The quantities of C3G in the hulled seeds of Heukjinju and Heukseol were significantly higher than those in the hulled seeds of Boseokheukchal and Heukjinmi (Fig 2).

Genes, involved in the biosynthetic pathway of anthocyanins, detected in developing black rice seeds through RNA-seq
We confirmed that the quantity of C3G and P3G maximally accumulated in the seeds of Heuknam and Heukseol, black rice cultivars, at around 20 DAH (Lee et al., unpublished), which corresponds to a previous report [57], and collected developing seeds of Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol at 20 DAH for gene expression analysis. To check the overall expression patterns of genes putatively involved in the biosynthetic pathway of anthocyanins, including bHLH [21,45], MYB [20,58], WD40 [59], PAL [22], C4H [23,24] [60], CCR [61,62], CAD [63], and LAR [64], in developing seeds of white, red, and/or black rice at 20 DAH, only one replicate of total RNA samples of Dongjin, Jeokjinju, or Heukseol was chosen for RNA-seq through Illumina sequencing. These RNA-seq data were used to identify potential candidate genes involved in the biosynthetic pathway of anthocyanins.
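One way to screen single-replicate, normalized RNA-seq values for candidate genes is a simple fold-change filter, sketched below. This is an illustrative assumption, not the selection procedure used in this study: the function name, the two-fold threshold, the pseudocount, the gene labels, and the expression values are all hypothetical.

```python
from typing import Dict, List


def black_specific_candidates(expression: Dict[str, Dict[str, float]],
                              black_sample: str,
                              other_samples: List[str],
                              min_fold: float = 2.0,
                              pseudocount: float = 0.01) -> List[str]:
    """Return genes whose value in the black rice sample is at least
    min_fold times the highest value among the other samples.

    A small pseudocount avoids division by zero for unexpressed genes.
    """
    selected = []
    for gene, values in expression.items():
        highest_other = max(values[sample] for sample in other_samples)
        fold = (values[black_sample] + pseudocount) / (highest_other + pseudocount)
        if fold >= min_fold:
            selected.append(gene)
    return sorted(selected)


# Hypothetical normalized values for three placeholder genes.
example_expression = {
    "gene_A": {"Dongjin": 0.0, "Jeokjinju": 0.0, "Heukseol": 0.4},
    "gene_B": {"Dongjin": 0.5, "Jeokjinju": 0.6, "Heukseol": 0.55},
    "gene_C": {"Dongjin": 0.1, "Jeokjinju": 0.2, "Heukseol": 0.5},
}
candidates = black_specific_candidates(
    example_expression, "Heukseol", ["Dongjin", "Jeokjinju"])
```

Because only one replicate per cultivar was sequenced, a filter of this kind can only nominate candidates; it cannot assess statistical significance, which is why the candidates were subsequently checked by qRT-PCR with three replicates.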
We verified that ANS (LOC_Os01g27490) showed black rice-specific expression patterns in seeds at 20 DAH, but it showed no expression in white rice (Dongjin) and red rice (Geonganghongmi and Jeokjinju) seeds (Fig 4), as shown in S5 Table. Among black rice cultivars, there were statistical differences in the expression of ANS (LOC_Os01g27490) (Fig 4). Moreover, we investigated the expression levels of LAR (LOC_Os03g15360), which converts leucocyanidin into catechin, one of the red rice-specific compounds [64,66], in developing seeds of Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol at 20 DAH, because Heukjinmi (black rice) has red rice as one of the parents as mentioned above. Of these rice cultivars, Geonganghongmi (red rice) has the maximum expression of LAR (LOC_Os03g15360) in its seeds, and Jeokjinju (red rice), Boseokheukchal (black rice) and Heukjinmi (black rice) also showed statistically significant upregulation of LAR (LOC_Os03g15360) in their seeds, compared to Dongjin (white rice), Heukjinju (black rice), and Heukseol (black rice) (Fig 4).

Correlation analysis between the quantity of cyanidin 3-glucoside and the expression level of each gene involved in the biosynthetic pathway of anthocyanins in developing rice seeds
We performed a correlation analysis in Dongjin, Geonganghongmi, Jeokjinju, Boseokheukchal, Heukjinju, Heukjinmi, and Heukseol to investigate the statistical relationship between the quantity of C3G in hulled rice seeds and the expression level of each gene putatively involved in the biosynthetic pathway of anthocyanins in rice seeds at 20 DAH (Fig 5 and S6 Table). The quantity of C3G in hulled rice seeds was positively correlated with the expression of the genes preferentially upregulated in black rice seeds (Figs 3 and 4), including bHLH1 (LOC_Os04g47059), bHLH2 (LOC_Os04g47040), MYB (LOC_Os01g49160), WD40 (LOC_Os02g45810), CHS (LOC_Os11g32650), F3H (LOC_Os04g56700), F3´H (LOC_Os10g17260), DFR (LOC_Os01g44260), and ANS (LOC_Os01g27490). The expression values of these genes were also positively correlated with one another. Interestingly, the expression of CHI (LOC_Os03g60509) did not correlate with the quantity of C3G, although it correlated positively with the expression of the genes preferentially upregulated in black rice seeds. In addition, the expression of CCR (LOC_Os09g25150), which shares p-coumaroyl CoA as a precursor with CHS (LOC_Os11g32650) [26,27,61,62], was negatively correlated with the quantity of C3G (Fig 5 and S6 Table).

PLOS ONE
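The kind of correlation analysis described above can be sketched with a Pearson correlation coefficient computed between C3G quantity and the expression level of one gene across the seven cultivars. The text does not state which correlation statistic was used, so the choice of Pearson is an assumption, and all numbers below are hypothetical rather than values from Fig 5 or S6 Table.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for seven cultivars (one white, two red, four black).
c3g_quantity = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 3.0]
gene_up = [0.0, 0.0, 0.0, 2.0, 6.0, 2.0, 6.0]    # rises with C3G
gene_down = [6.0, 6.0, 6.0, 4.0, 0.0, 4.0, 0.0]  # falls as C3G rises

r_up = pearson_r(c3g_quantity, gene_up)
r_down = pearson_r(c3g_quantity, gene_down)
```

A coefficient near +1 corresponds to the pattern reported for genes such as CHS and ANS, and a negative coefficient to the pattern reported for CCR. With only seven cultivars, three of which have no detectable C3G, such coefficients should be interpreted cautiously.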

As Kim et al. described [46], no detectable C3G was identified in seeds of Dongjin, a white rice cultivar, or of Geonganghongmi and Jeokjinju, red rice cultivars, and the black rice cultivars formed two groups with statistically different quantities of C3G (Fig 2).
As reported in maize, the biosynthesis of anthocyanins is regulated by two transcription factors: R, a bHLH gene, and C1, a myb gene [16-19]. The overexpression of C1 and B-Peru, a bHLH gene, resulted in the biosynthesis of anthocyanins in developing white rice seeds [21]. Furthermore, the overexpression of maize C1 and a rice bHLH gene [OSB1 (AB021079, LOC_Os04g47080) or OSB2 (AB021080, LOC_Os04g47059)] resulted in the accumulation of anthocyanins in developing white rice seeds. However, no anthocyanin accumulation occurred in developing rice seeds upon the overexpression of only one of C1, B-Peru, OSB1, or OSB2. These results suggested that, in developing rice seeds, the overexpression of both R, a bHLH gene, and C1, a myb gene, is essential for the biosynthesis of anthocyanins [21]. Kala4 (LOC_Os04g47059), an essential bHLH gene involved in the biosynthesis of anthocyanins in rice seeds, was upregulated more strongly in developing black rice seeds than in white rice seeds [45]. The overexpression of Kala4 led to the accumulation of anthocyanins in near-isogenic rice lines in which Kala3, a myb gene, was functionally expressed but Kala4 was not [45,67]. In our RNA-seq data from rice seeds at 20 DAH, we identified fifty-four bHLHs and seventy-five MYBs. Of these, three bHLHs, LOC_Os01g09990, LOC_Os04g47040 (bHLH2), and LOC_Os04g47059 (bHLH1; known as OSB2 or Kala4), and three MYBs, LOC_Os01g49160 (MYB), LOC_Os01g74410, and LOC_Os03g29614, were differentially upregulated in developing black rice seeds. This indicated that those genes are putatively involved in the biosynthesis of anthocyanins in developing seeds of black rice (S5 Table). Interestingly, the MYB gene (Y15219, LOC_Os06g10350) homologous to maize C1 reported by Reddy et al. [20] did not show black rice-specific expression patterns in developing seeds, suggesting that other putative functional MYBs with seed-specific expression patterns are involved in the biosynthesis of anthocyanins (S5 Table).
In addition, WD40 (LOC_Os02g45810) was reported to regulate the biosynthesis of anthocyanins in Arabidopsis thaliana [59] and rice [68] and was differentially upregulated in black rice seeds (Fig 3B). However, in contrast to bHLH and MYB, WD40 might be functionally expressed in developing white rice seeds because Sakamoto et al. showed the accumulation of anthocyanins in seeds of white rice by the overexpression of bHLH and MYB, but not with WD40 [21].
The structural genes involved in the biosynthetic pathway of anthocyanins, including CHS (LOC_Os11g32650), F3H (LOC_Os04g56700), F3´H (LOC_Os10g17260), DFR (LOC_Os01g44260), and ANS (LOC_Os01g27490), were differentially upregulated in developing black rice seeds (S1 Fig), whereas the expression data from Nipponbare, a white rice cultivar, showed no such upregulation (S2 Fig). Further verification of the RNA-seq data by qRT-PCR showed that, of the structural genes in the biosynthetic pathway of anthocyanins, CHS (LOC_Os11g32650), F3H (LOC_Os04g56700), F3´H (LOC_Os10g17260), and DFR (LOC_Os01g44260) were differentially upregulated in developing seeds of black rice (Fig 3C). Moreover, ANS (LOC_Os01g27490), which converts leucoanthocyanidin into anthocyanidin [39-41], was expressed only in developing seeds of black rice and not in seeds of white and red rice (Fig 4A and S5 Table). However, how the expression of ANS is switched on in black rice seeds remains an open question that requires further investigation.
LAR converts leucocyanidin, a precursor shared by LAR [64] and ANS [39-41], into catechin, and procyanidins are polymerized from catechin [64]. These compounds are biosynthesized in seeds of red rice but not in white and black rice seeds [66]. As mentioned above, the expression of LAR (LOC_Os03g15360) was significantly upregulated in the seeds of Heukjinmi, which has a red rice cultivar in its pedigree, as in the seeds of the red rice cultivars Geonganghongmi and Jeokjinju (Fig 4B). However, it was also significantly upregulated in the seeds of Boseokheukchal, compared to Dongjin (white rice) and the other two black rice cultivars, Heukjinju and Heukseol, indicating that higher expression of LAR is putatively related to the reduction of C3G biosynthesized in black rice seeds (Figs 2 and 4B). For Boseokheukchal, it is necessary to investigate whether red rice was incorporated as a parent. These expression data are consistent with the C3G quantities in the four black rice cultivars, which divide into a first group, Heukjinju and Heukseol, with significantly higher C3G content, and a second group, Boseokheukchal and Heukjinmi, with significantly lower C3G content (Fig 2).
Furthermore, correlation analysis between the C3G content of hulled rice and the expression levels of genes involved in the biosynthesis of anthocyanins revealed that the quantities of anthocyanins biosynthesized in black rice seeds are positively correlated with the expression levels of bHLH1 (LOC_Os04g47059), bHLH2 (LOC_Os04g47040), MYB (LOC_Os01g49160), WD40 (LOC_Os02g45810), CHS (LOC_Os11g32650), F3H (LOC_Os04g56700), F3´H (LOC_Os10g17260), DFR (LOC_Os01g44260), and ANS (LOC_Os01g27490) (Fig 5 and S6 Table). However, some uncertainty about the biosynthesis of anthocyanins in rice seeds remains because we carried out this study with limited information and only a few black rice cultivars. Further studies with more black rice cultivars and more detailed transcriptomic and genomic data are therefore required to fully understand the biosynthesis of anthocyanins in developing seeds of black rice. Such studies would also allow a database to be established for the expression of each gene related to the biosynthetic pathway of anthocyanins in developing seeds of various black rice cultivars, and their results could be used in rice breeding programs to improve the anthocyanin content of seeds. In addition, as shown in Figs 3 and 4, the red rice cultivars exhibited expression patterns distinct from those of the white and black rice cultivars. Further transcriptomic and genomic study is also needed to better understand the biosynthetic pathway of proanthocyanidins in developing seeds of red rice.

Conclusion
In this study, we showed that the C3G content of black rice seeds correlates positively with the expression levels of genes including bHLH1, bHLH2, MYB, WD40, CHS, F3H, F3´H, DFR, and ANS. In addition, compared to white and red rice cultivars, several genes that regulate the biosynthesis of anthocyanins, including bHLHs, MYBs, and WD40, were highly upregulated in developing seeds of black rice cultivars, and the structural genes in the biosynthetic pathway of anthocyanins, including CHS, F3H, F3´H, DFR, and ANS, were also differentially upregulated in black rice seeds. Moreover, we report new candidate transcription factor genes for the biosynthesis of anthocyanins in black rice seeds, bHLH2 (LOC_Os04g47040) and MYBs (LOC_Os01g49160, LOC_Os01g74410, and LOC_Os03g29614), which show black rice seed-specific expression patterns.